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Abstract 

Fixed-  and  free-knot  least-squares  data  approximation  by  polynomial  splines  is  con- 
sidered. Classes  of  knot-placement  algorithms  are  discussed.  A practical  example  of  knot 
placement  is  presented,  and  future  possibilities  in  free-knot  spline  approximation  are 
addressed. 


1 Introduction 

The  representation  of  univariate  polynomial  splines  in  terms  of  B-splines  is  reviewed 
(Section  2),  leading  to  the  problem  of  obtaining  fixed-  and  free-knot  (2  spline  approxim- 
ations (Section  3).  The  accepted  approach  to  the  fixed-knot  case  is  recalled  (Section  4) 
and  the  manner  in  which  spline  uncertainties  can  be  evaluated  given  (Section  5).  The 
importance  of  families  of  spline  approximants  is  emphasised  (Section  6).  The  free-knot 
problem  is  formulated  (Section  7)  and  several  of  the  established  and  some  lesser-known 
knot- placement  strategies  reviewed  (Section  8).  Conclusions  are  drawn  and  future  pos- 
sibilities indicated  (Section  9). 

2 Univariate  polynomial  splines 

Let  I :=  [a;min,a;max]  be  an  interval  of  the  .r-axis,  and  ,rmin  = Ao  < Ai  < A2  < • • • < 
Ajv-i  < A n < Ajv+i  = rmax  a partition  of  I.  A spline  s(a:)  of  order  n (degree  n—  1)  on  / is 
a piecewise  polynomial  of  order  n on  (Xj,  AJ+] ),  j = 0, . . . , N.  The  spline  s is  at 

A j if  card  (A/i  = A j,£  € {1, . . . ,n})  = k.  The  partition  points  A = { A j } )v  are  the  (interior) 
knots  of  s.  To  specify  the  complete  set  of  knots  needed  to  define  s on  I in  terms  of  B- 
splines,  the  knots  {Aj}^  are  augmented  by  knots  {Aj}J"in  and  {AJ}9v+2.  q = N + n, 
satisfying 

Al-n  < • • • < Ao,  A/v+1  < • • ■ < A q. 

For  many  purposes,  a good  choice  [10]  of  additional  knots  is 

Ai_„,  = • • • = A0,  A/v+i  = • • • = Xq. 
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It  readily  permits  derivative  boundary  conditions  to  be  incorporated  in  spline  approxi- 
mants  [7].  On  I,  s(x)  has  the  B-spline  representation  [5] 

9 

s(x)  :=  s(c,X]x)  = '^2cjNnJ( X;x),  (2.1) 

j= 1 

where  Nnj(X\  x ) is  the  B-spline  [5, 12]  of  order  n with  knots  {A k}]-n  and  c = (ci, . . . , c9)T 
are  the  B-spline  coefficients  of  s.  Each  Nnj( A;  x)  is  a spline  with  knots  A,  is  non-negative 
and  has  compact  support.  Specifically, 

Nn,j(X-,x)  >0,  x G ( Aj — n , Xj ) , supp(Al„>JI(A,  a;))  = [A^_  ,, . Aj] . (2.2) 

The  B-spline  basis  {l\r„;j  (A;x)}j=1  for  splines  of  order  n with  knots  A is  generally  very 
well-conditioned  [10].  Moreover,  the  basis  functions  for  any  x G [xmin , xmax]  can  be 
formed  in  an  unconditionally  stable  manner  using  a three-term  recurrence  relation  [5, 12]. 
Specifically,  the  relative  errors  in  the  values  fl(Nnj(  A;  x))  of  the  basis  function  computed 
using  IEEE  floating-point  arithmetic  [18]  satisfy 

\fl(Nn,j(X]  x))  - NnJ(X;x)\  < CnNntj(X]  x)p, 

where  C is  a constant  that  is  a small  multiple  of  unity  and  rj  is  the  unit  roundoff  of  the 
floating  point  processor  [5] . The  B-spline  basis  for  splines  of  order  3 with  interior  knots 
at  x = (1, 2,  5)t  and  coincident  end  knots  at  x = 0 and  10,  is  shown  in  Figure  1. 


Fig.  1.  The  B-spline  basis  for  splines  of  order  3 for  some  nonuniformly  spaced  knots. 
The  first  three  B-spline  basis  functions  are  shown  as  solid  lines  and  the  remaining  three 
as  dotted  lines. 

Valuable  properties  of  s can  be  deduced  [12]  from  those  of  the  B-splines.  A useful 
property  is  that,  for  any  x 6 /,  s(x)  is  a convex  combination  of  the  coefficients  of  the 
B-splines  whose  support  contains  x.  Thus,  local  bounds  for  s can  readily  be  found: 

min  Cfc  < s(x)  < . max  ct,  x G [A,-,A,-+il. 

j<k<j+n  j<k<j-j~n 

These  bounds  imply  a mimicking  property  for  s,  viz.,  that  the  elements  of  c tend  to 
vary  in  much  the  same  way  that  s varies.  Figure  2 depicts  a spline  curve  s of  order 
4 with  “non-polynomial”  shape  having  interior  knots  at  x = (1,2, 5)T,  coincident  end 
knots  at  x = 0 and  10,  and  B-spline  coefficients  (0.00,0.20,0.60, 0.22, 0.18, 0.14,0. 12)T. 
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To  reproduce  this  shape  to  visual  accuracy  with  a polynomial  would  require  a high 
degree  and  hence  many  more  defining  coefficients.  The  mimicking  property  is  evident: 
successive  elements  of  c rise,  fall  sharply  and  then  gently,  behaving  in  a similar  way  to  s. 


Fig.  2.  A spline  curve  with  “non-polynomial”  shape  illustrating  the  mimicking  prop- 
erty. 


3 Fixed-  and  free-knot  approximation 

Two  types  of  data  approximation  (or  data  modelling)  in  the  l2  norm  by  splines  are 
regularly  considered.  One  is  the  determination  of  the  B-spline  coefficients  c for  given 
data,  a prescribed  order  n and  prescribed  knots  A.  The  other  is  the  determination  of  c 
and  A for  given  data  and  spline  order  n.  The  former  problem  is  linear  with  respect  to 
the  parameters  of  the  spline,  just  c being  regarded  as  unknown.  The  latter  is  nonlinear, 
both  c and  A being  unknown. 

The  linear  case  is  well  understood,  with  highly  satisfactory  algorithms  [10]  and  soft- 
ware implementations  [1,  16]  available.  The  nonlinear  case  remains  a research  problem, 
although  useful  algorithms  (Section  8)  have  been  proposed,  implemented  and  used.  Many 
of  these  algorithms  “iterate”  with  respect  to  A,  where  for  each  choice  of  knots  the  res- 
ulting linear  problem  is  solved  for  c.  Thus,  the  linear  problem  (Section  4)  is  important 
in  its  own  right  and  as  part  of  the  solution  strategy  for  knot-placement  algorithms. 

4 Least-squares  data  approximation  by  splines  with  fixed  knots 

The  £2  data  approximation  problem  for  splines  with  fixed  knots  can  be  posed  as  follows. 
Given  are  data  points  {(aq, a/,)}™)  with  X\  < ■■■  < xm , and  corresponding  weights 
{w.(}™  or  standard  uncertainties  {«,}”’ . The  v),  reflect  the  relative  quality  of  the  y,}  m 
is  the  standard  uncertainty  of  y,  and  corresponds  to  the  standard  deviation  of  possible 
“measurements”  at  x = aq  of  the  function  underlying  the  data,  y,  being  one  realisation. 
Given  also  are  the  N knots  A = and  the  order  n of  the  spline  s. 

When  weights  are  specified,  the  problem  is  to  determine  the  spline  s(x)  of  order 
n,  with  knots  A,  such  that  the  two-norm  of  {t/qc*}™  is  minimised  with  respect  to  c. 


1The  Xi  are  taken  as  exact  for  the  treatment  here.  A generalised  treatment  is  possible,  in  which  the  X{  are  also 
regarded  as  inexact.  The  problem  becomes  nonlinear  (in  c). 
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When  standard  uncertainties  are  specified,  the  two-norm  of  {u~1el  j'"‘  is  minimised  with 
respect  to  c.  If  Wi  — u~[ 1 , i = 1, . . . ,m,  the  two  formulations  are  identical  in  terms  of  the 
spline  produced.  When  weights  are  specified,  s is  referred  to  as  a spline  approximant. 
When  uncertainties  are  prescribed,  s is  known  as  a spline  model.  There  are  differences 
(Section  5)  in  interpretation  in  terms  of  the  statistical  uncertainties  associated  with  the 
solution  and  in  terms  of  validating  the  spline  model  so  obtained. 

The  use  of  a formulation  in  terms  of  standard  uncertainties,  together  with  the  B-spline 
representation  (2.1)  of  s,  gives  the  linear  algebraic  formulation2 

mineTV^T1e,  e = y - Ac,  (4.1) 

where  y = (yi, , ym)T,  A is  an  m x q matrix  with  — Nnj(xt),  and  Vy  = 
diag (u2, . . . ,ufn).  Matrix  computational  methods  can  be  applied  to  this  formulation. 
As  a consequence  of  property  (2.2)  of  the  B-splines,  A is  a rectangular  banded  matrix 
of  bandwidth  n [8]. 

The  linear  algebraic  solution  can  be  effected  using  Givens  rotations  to  triangularise 
the  system,  back-solution  then  yielding  the  coefficients  c [6].  The  number  of  floating- 
point operations  (flops)  required  is  to  first  order  0(mn2),  i.e.,  independent  of  the  number 
of  knots.  Hence  computing  a spline  model  for  many  knots  is  hardly  more  expensive  than 
one  for  a few  knots.  Moreover,  since  for  many  problems  cubic  splines  ( n = 4)  yield  a 
good  balance  between  approximation  properties  and  smoothness  (continuity  class  C2), 
regarding  the  order  as  fixed  gives  a flop  count  0(m). 

The  vector  c is  unique  [11]  if  there  is  a strictly  ordered  subset  t = {tj}\  of  x such 
that  the  Schoenberg- Whitney  conditions  [21] 

tj  £ supp(iVn  j ( A; x)),  j = l,...,q,  (4.2) 

hold.  In  a case  where  the  conditions  (4.2)  do  not  hold3,  an  appropriate  member  can  be 
selected  from  the  space  of  possible  solutions.  Such  a selection  is  also  advisable  if  the 
conditions  are  in  a practical  sense  “close”  to  being  violated.  A particular  solution  can 
be  determined  by  augmenting  the  least-squares  formulation  by  a minimal  number  of 
equality  constraints  for  c such  that  A has  full  column  rank  [10]. 

An  instance  of  the  type  of  data  set  to  which  the  algorithms  of  this  paper  are  addressed 
is  shown  in  Figure  3:  Such  a data  set  (cf.  Section  2)  has  the  variety  of  behaviour  that 
cannot  readily  be  reproduced  by  some  other  classes  of  approximating  functions. 

5 Spline  uncertainties 

Once  a valid  spline  model  has  been  obtained,  the  uncertainties  associated  with  the 
spline  can  be  evaluated  [9].  Uncertainty  evaluations  are  essential  in  metrology,  where  all 
measurement  results  are  to  be  accompanied  by  a quantification  of  their  reliability  [2], 
and  important  in  other  fields.  The  key  entity  is  the  covariance  matrix  Vc  of  the  spline 


2 A further  generalisation  is  possible  in  which  mutual  dependencies  are  permitted  among  the  measurement  errors. 
In  this  case,  Vy  is  non-diagonal. 

3A  set  of  knots  giving  rise  to  this  circumstance  may  be  a consequence  of  an  automatic  knot-placement  procedure. 
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Independent  variable 

Fig.  3.  A data  set  representing  heat  flow  as  a function  of  temperature.  Such  data  forms 
the  basis  of  the  determination  of  thermophysical  properties  of  materials  under  test.  For 
clarity  only  every  fifth  data  point  is  shown. 

coefficients  c.  Using  recognised  procedures  of  linear  algebra, 

Uc  = (ATVy-1A)-1.  (5.1) 

From  this  result,  the  standard  uncertainty  of  any  quantity  that  depends  on  c can  be 
evaluated.  Specifically,  for  a given  constant  vector  p,  the  standard  uncertainty  w(pTc) 
of  pTc  is  given  by 

u2(pTc)  = pTUcp. 

By  setting  p to  contain  the  values  of  the  B-spline  basis  at  a point  x £ I,  the  standard 
uncertainty  of  s(x ) can  be  formed.  The  standard  uncertainty  of  a nonlinear  function  of 
c can  be  estimated  by  first  linearising  the  expression  about  the  solution  value  of  c. 

If  weights  rather  than  uncertainties  are  specified  for  the  data,  (5.1)  takes  the  form 

Vc=o2{AtW2A)-\ 

where  <r  estimates  the  standard  deviation  of  the  weighted  residuals  {wie;}™,  W = 
diag(uq, . . . ,wm),  and 

a2  = eTW2e  /(m  - q) 

evaluated  at  the  solution. 

6 Families  of  approximants 

When  dealing  with  certain  classes  of  approximating  function  it  is  natural  and  useful  to 
consider  families  of  approximants.  A simple  example  is  polynomial  approximation,  for 
polynomials  Pj(x)  of  order  j = 1, 2, . . . , N,  for  some  maximum  order  N.  Each  member 
of  the  family  “contains”  the  previous  member.  It  is  then  meaningful  to  consider  the 
approximation  measure,  e.g.,  the  £2-norm  here,  with  respect  to  indices  denoting  mem- 
bers. Thus,  the  value  of  the  t^-norm  for  the  polynomial  approximant  of  order  j can  be 
inspected  with  respect  to  index  j for  j = 1, 2, . . . , N.  For  data  approximation,  it  is  more 
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meaningful  to  use  as  the  measure  the  root-mean-square  residual  given  by  dividing  the 
^2-norm  by  (m  — j)1/2,  For  representative  data,  the  expectation  is  that  as  j increases 
this  quantity  should  stabilise  to  an  essentially  constant  value.  This  property  provides  a 
useful  validation  procedure.  If  weights  u~l  are  used  as  in  Section  4 this  measure  should 
settle  to  the  value  unity.  Thus  the  approximant  with  index  j (normally  the  smallest 
such)  that  achieves  the  value  one  is  sought. 

Within  most  of  the  strategies  outlined  in  Section  8 it  is  possible  to  produce  results 
for  N = 1, 2, . . . knots,  and  thus  to  study  the  effect  of  the  number  of  knots  on  the  quality 
of  the  approximant.  From  such  information  it  may  be  possible  to  select  an  acceptable 
solution.  If  for  each  number  of  knots,  the  knots  contain  those  for  the  previous  number, 
and  an  £2  approximant  is  determined,  the  sequence  of  approximants  for  N = 1,2,... 
knots  forms  a family.  A family  has  the  property  that  the  sequence  of  values  of  the 
f 2-norm  is  monotonically  decreasing. 

7 Least-squares  data  approximation  by  splines  with  free  knots 

The  problem  of  least-squares  data  approximation  by  splines  with  free  knots  can  be 
formulated  in  the  same  way  as  that  for  fixed  knots  (Section  4),  except  that  the  knots 
are  not  specified  a priori , either  in  location  or  number.  The  formulation  (4.1)  no  longer 
yields  a linear  problem,  since  the  matrix  A of  B-spline  values  is  now  a function  of  A. 
Instead,  e(A)  = y — A(A)c,  and  it  is  required  to  solve 

mineT(A)Fy-1e(A).  (7.1) 

A;c 

In  order  to  reflect  the  fact  that  for  any  given  knot  set  the  B-spline  coefficients  are  given 
by  solving  a relatively  simple,  linear  problem,  formulation  (7.1)  can  be  expressed  as 

min  ^mineT(A)V_y~1e(A)j  . (7.2) 

Extensive  use  is  made  of  this  elementary  result. 

8 Knot-placement  strategies 

Many  knot-placement  strategies  have  been  proposed  and  used.  Some  of  these  strategies 
are  outlined  and  their  properties  indicated.  Several  of  the  strategies  generate  a family  of 
candidate  spline  approximants,  with  advantages  for  model  validation. 

8.1  Manual  methods 

Manual  methods  can  be  classed  as  those  methods  for  which  the  user  examines  the  general 
“shape”  of  the  function  underpinning  the  data,  selecting  the  number  and  location  of  the 
knots  on  this  basis.  With  practice  and  visual  aids,  acceptable  solutions  can  often  be 
obtained  [6].  Naturally,  knots  are  chosen  to  be  more  concentrated  where  “things  are 
happening”  in  contrast  to  regions  where  the  underpinning  behaviour  is  innocuous. 

8.2  Strategies  that  depend  only  on  abscissa  values 

Strategies  based  on  the  manner  in  which  the  values  of  the  independent  variable  are 
distributed  may  be  used  to  place  the  knots  (at  points  that  are  not  necessarily  the  data 
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abscissae  themselves).  A facility  in  DASL  (the  NPL  Data  Approximation  Subroutine 
Library)  [1]  provides  one  such  strategy,  based  on  the  Schoenberg-Whitney  conditions 
(4.2)  in  the  following  way.  Intuitively,  these  conditions  imply  that  there  is  no  region  where 
there  are  “too  many”  knots  compared  with  the  number  of  data  points.  Mathematically , 
these  conditions  guarantee  uniqueness.  Numerically , their  satisfaction  does  not  ensure 
that  the  solution  is  well-defined.  If  the  conditions  are  “close”  to  being  violated,  c will  be 
sensitive  to  perturbations  in  the  data.  In  particular,  since  the  behaviour  of  c “controls” 
that  of  s (Section  2),  the  spline  is  likely  to  exhibit  spurious  behaviour  such  as  large 
undesirable  oscillations  if  ||c||2  » ||y||2. 

It  follows  that  a sensible  choice  of  knots  would  be  such  that  the  Schoenberg-Whitney 
conditions  are  satisfied  “as  well  as  possible”  for  a data  subset.  Such  a choice  is  made  in 
DASL  [1]  for  spline  approximation  of  arbitrary  order.  It  is  also  made  in  a cubic  spline 
interpolation  routine  in  the  NAG  Library  [16],  regarding  spline  interpolation  as  a special 
case  of  spline  approximation  in  which  q = m and  N — m — n.  The  choice  made  is  seen 
most  simply  by  first  applying  it  to  spline  interpolation.  Consider  the  choice 

A?  = 2 "b  ®j+l(n+i)/2j )>  j = 1, . . . ,m  — n, 

where  [uj  is  the  largest  integer  no  larger  than  v.  For  n even,  A j = Xj+n/ 2.  Thus,  the  choice 
tj  = Xj-n/2  would  be  made.  However  (Section  2),  supp (Nnj)  = [Aj_n,  Aj].  Thus,  index- 
wise,  the  Schoenberg-Whitney  conditions  are  satisfied  as  well  as  possible  in  the  sense 
that  the  index  of  A j-n/2  falls  halfway  between  the  indices  of  the  support  endpoints  Aj_„ 
and  \j . Comparable  considerations  apply  for  n odd.  Precisely  this  choice  is  recommended 
[14,  16]  in  the  context  of  cubic  spline  interpolation.  It  is  the  “not  a knot”  criterion,  as 
a practical  alternative  to  the  classical  use  of  boundary  derivatives.  A knot  is  placed  at 
each  “interior”  data  value  x*  apart  from  x2  and  xm_i. 

The  above  choice  can  be  interpreted  as  follows.  Consider  the  graph  x = F(£)  given 
by  the  join  of  the  points  {(Lx*)}]".  The  jth  interior  knot,  A j,  for  j = 1 , ...,m  — n, 
is  given  by  F(j  + n/2).  The  successive  spacings  between  the  index  arguments  of  F for 
j = 0, . . . , N + 1,  using  F( 0)  = xmin  and  F(N  + 1)  = xmax,  are  therefore 

1 + n/2, 1,...,  1,1  + n/2. 

N- 1 

For  approximation,  these  successive  spacings  are  proportionally  increased  to  account  for 
the  fact  that  there  are  fewer  knots.  The  resulting  expression  for  the  jth  interior  knot  is 

Aj  = F(1  + (m  — l)(j  + n/2  — l)/{q  - 1)),  j = 

The  choice  can  be  interpreted  as  placing  the  interior  knots  such  that  there  is  an  approx- 
imately equal  number  of  data  points  in  each  knot  interval  (interval  between  adjacent 
knots),  except  that  in  the  first  and  the  last  interval  there  are  approximately  n/2  times 
as  many  points.  The  strategy  [1]  has  the  property  that  when  N is  such  that  the  data  is 
interpolated,  the  choice  of  knots  agrees  with  one  of  the  recommended  choices  for  spline 
interpolation.4 

4 The  approach  tends  to  give  better  knot  locations  if  the  data  is  gathered  in  a manner  which  ensures  that  the  local 
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Figure  4 illustrates  the  above  strategy  for  a spline  interpolant  and  approximant  of 
order  4 to  data  with  abscissae  x = (0,0.25,0.5,0.75, 1, 1.25, 1.5, 1.75, 2, 3,4, 5,7.5, 10)T. 

Each  figure  shows  the  graph  x = F(£).  For  the  interpolant  (left-hand  graph),  ten  knots 
are  chosen  to  coincide  with  the  abscissa  values  X3, . . . , Xi2-  For  the  approximant  (right- 
hand  graph),  four  knots  are  chosen  such  that  there  are  two  points  in  each  interval, 
excepting  the  first  and  last  interval  where  there  are  four  points,  i.e.,  n/2  = 2 times  as 
many.  The  distribution  of  the  knots  reflects  that  of  the  abscissa  values. 


0 5 10  15  0 5 10  15 


Fig.  4.  A knot  placement  strategy  depending  only  on  the  abscissa  values. 

A simpler  strategy  is  to  select  uniformly  spaced  knots.  The  Schoenberg-Whitney 
conditions  will  not  necessarily  automatically  be  satisfied  by  such  a choice,  and  the  spline 
approximant  would  therefore  not  be  unique,  although  the  approach  indicated  at  the  end 
of  Section  4 could  be  applied. 

8.3  Sequential  knot-insertion  strategies 

In  a sequential  knot-insertion  strategy,  a succession  of  approximants  is  obtained,  in  which 
for  each  approximant  a knot  is  inserted  in  the  knot  interval  that  gives  rise  to  the  greatest 
contribution  to  the  i 2 error.  A knot  interval  is  an  interval  between  adjacent  knots,  where 
the  endpoints  of  I count  as  knots  for  this  purpose.  Previously  inserted  knots  are  retained 
undisturbed.  Several  variants  are  possible  (also  see  Section  8.10),  e.g.: 

• Start  the  process  with  a number  of  knots  already  in  place,  perhaps  obtained  from 
information  specific  to  the  application. 

• Candidate  positions  for  a new  knot  are 

* The  continuum  of  points  within  the  interval.  The  approach  gives  rise  to  the 
minimisation  of  a univariate  function  that  may  possess  local  minima. 

* The  subset  within  the  interval  of  a discrete  set  of  points  chosen  a priori , e.g.,  the 
data  abscissa  themselves  or  a uniformly  spaced  set  of  a;- values.  The  approach 

density  of  the  data  is  greater  in  regions  where  the  behaviour  of  y is  more  marked. 
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gives  rise  to  a finite  computation  for  the  globally-best  choice  of  knot,  relative 
to  the  discretisation,  with  respect  to  previous  knots. 

• More  than  one  knot  can  be  inserted  at  a time.  Doing  so  gives  an  approach  that 
is  intermediate  between  full  optimisation  (Section  8.6)  and  sequential  (single)  knot 
insertion.  Computation  times  rise  rapidly  with  the  number  of  “simultaneous”  knots 
so  inserted,  so  in  practice  only  a small  number,  say  two  or  three,  might  be  feasible. 

The  “upper  set”  of  crosses  in  Figure  5 shows  the  root-mean-square  residual  as  a function 
of  the  number  of  knots  for  the  application  of  this  strategy  to  the  thermophysical  data 
of  Figure  3. 


Fig.  5.  The  root-mean-square  residual  as  a function  of  the  number  of  knots  for  the 
application  of  knot-insertion  and  knot-removal  strategies  to  the  thermophysical  data  of 
Figure  3.  The  “upper  set”  of  crosses  indicate  the  values  obtained  for  knot  insertion  and 
the  lower  for  knot  removal.  The  knot-removal  strategy  starts  with  the  knot  set  provided 
by  the  knot-insertion  strategy,  which  was  terminated  after  81  knots  had  been  placed. 
The  figure  depicts  the  root-mean-square  residual  on  a logarithmic  scale,  so  its  value 
varies  by  a factor  of  1000  from  1 to  81  knots. 


8.4  Sequential  knot-removal  strategies 

In  a sequential  knot-removal  strategy,  the  starting  point  is  an  initial  spline  approximant 
having  a “large”  number  of  knots  that  typically  would  be  regarded  as  an  acceptable 
approximant  to  the  data  and  that  contains  (perhaps  many)  more  knots  than  desired.  Also 
see  Section  8.10.  Each  successive  approximant  is  obtained  from  the  previous  approximant 
by  deleting  one  (or  more)  knots.  The  knot  selected  for  removal  is  chosen  as  that  having 
least  effect  in  terms  of  the  change  in  the  error.  The  process  is  continued  until  an 
acceptable  approximant  is  no  longer  obtained. 

The  initially  large  number  of  knots  (Section  8.10)  provides  an  appreciable  number 
of  candidate  knots  for  removal  and  thus  greater  flexibility.  The  rationale  is  that  in 
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contrast  to  successive  knot  insertion  a succession  of  acceptable  approximants  is  obtained 
as  opposed  to  a succession  of  unacceptable  approximants,  until  the  final  “solution”  is 
provided.  There  are  variants,  as  with  sequential  knot  insertion.  For  example,  several 
knots  can  be  removed  at  each  stage. 

A different  class  of  knot  removal  algorithms  [20]  is  based  on  a general  class  of  £p  norms. 
It  is  not  concerned  Specifically  with  data  approximation,  but  with  replacing  an  initial 
spline  approximant  (that  may  correspond  to  an  approximant)  by  one  that  is  acceptably 
close  according  to  the  measure. 

The  “lower  set”  of  crosses  in  Figure  5 shows  the  root-mean-square  residual  as  a 
function  of  the  number  of  knots  for  the  application  of  this  strategy  to  the  thermophysical 
data  part-depicted  in  Figure  3. 

8.5  Theory-based  approaches 

The  distance  of  a spline  s(x)  with  knots  A from  a sufficiently  differentiable  function 
f(x)  is  proportional  to  /i"|/(”)  (£)|,  where  h is  the  local  knot  spacing  and  £ is  a value  of 
x [14].  Consider  inverting  this  expression  in  order  approximately  to  equalise  the  error 
with  respect  to  x.  The  lengths  of  the  knot  intervals  should  consequently  be  chosen  to  be 
proportional  to  |/(n)(£)|-1/n,  where  £ is  a value  in  the  neighbourhood  of  the  respective 
knot  interval.  Consider  the  function 

T®  / /*3?max 

F(x)  = I |/(”)(t)|1/ridt  / / |/(n)(f)|1/n<ft . (8.1) 

''Emin  • 

Take  knots  given  by 

F{Xj)  = iv+T  = * (8-2) 

This  result  corresponds  to  dividing  the  range  of  the  monotonically  increasing  function 
F(x),  for  x € I,  into  N + 1 contiguous  subranges  of  equal  length,  taking  the  values  of  x 
corresponding  to  the  subrange  endpoints  as  the  knots. 

In  practice  /,  let  alone  F,  is  unknown.  Various  efforts  have  been  made  to  estimate  / 
and  hence  F from  the  data  points.  For  instance,  if  the  data  is  approximated  by  a spline  of 
order  n+  1,  its  nth  derivative,  a piecewise-constant  function,  can  be  used  to  estimate  F 
[3].  It  is  then  straightforward  to  form  the  required  knots.  The  approach  begs  the  question 
in  the  case  of  data.  In  order  to  estimate  knots  for  a spline  of  order  n,  it  is  first  necessary 
to  construct  a spline  approximant  of  order  n -I- 1 for  the  data,  the  construction  of  which 
itself  requires  a choice  of  knots. 

Alternatively  [13],  a spline  approximant  of  order  n for  the  data  can  be  constructed 
for  some  convenient  choice  of  knots.  Its  nth  derivative  is  of  course  zero  (except  at  the 
knots).  However,  its  (n  — l)th  derivative  is  piecewise  constant,  a function  that  can  be 
approximated  by  the  join  of  the  mean  values  at  the  knots  of  the  constant  pieces  to  the 
immediate  right  and  left,  with  special  consideration  at  the  endpoints  of  L The  derivative 
of  this  piecewise-linear  function  then  provides  a piecewise-constant  representation  of  the 
nth  derivative,  that  can  be  used  as  before.  Knots  can  then  be  deduced  from  this  form 
as  above.  The  advantage  of  this  approach  is  that  it  can  be  iterated  [13].  If  the  process 
“converges”,  the  result  can  be  used  to  provide  the  required  knot  set.  The  process  can 
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work  well,  but  is  capable  of  producing  disappointing  results.  Several  variants  of  the  basic 
concept  are  possible.  The  approach  warrants  careful  re-visiting. 

8.6  “Overall”  optimisation  approaches 

For  any  given  value  of  N,  the  problem  is  regarded  as  an  optimisation  problem  with 
respect  to  the  overall  error  measure.  It  is  necessary  to  provide  a sensible  initial  estimate  of 
the  knot  positions.  Local  solutions  which  may  be  grossly  inferior  to  the  global  solution  are 
possible  [4],  At  an  optimal  solution,  knots  may  coalesce,  thus  reducing  the  continuity  of 
the  spline  at  such  points  [19];  the  same  comment  applies  to  the  sequential-knot-insertion 
and  optimisation  approach  (Section  8.7). 

8.7  Sequential  knot  insertion  and  optimisation 

Sequential  knot  insertion  with  optimisation  is  identical  to  the  sequential  knot-insertion 
strategy  (Section  8.3)  except  that,  after  each  knot  is  inserted,  all  previously-inserted 
knots  are  adjusted  such  that  the  complete  set  of  knots  at  that  stage  are  (locally)  optimal 
with  respect  to  the  overall  error  measure.  One  such  strategy  [15]  carries  out  the  optim- 
isation at  each  stage  by  adjusting  in  turn  each  knot  in  the  current  knot  set  in  order  to 
achieve  satisfactory  reduction  in  the  1 2 norm,  and  repeating  the  complete  adjustment  as 
necessary.  This  strategy  is  not  as  poor  as  the  traditional  one-variable-at-a-time  strategy 
for  nonlinear  optimisation  because  knots  far  from  the  newly-inserted  knot  tend  to  have 
little  effect  on  the  error  measure. 

Buffering  to  prevent  knots  coalescing  and  reducing  the  continuity  of  the  approximant 
can  be  used.  Various  features  can  be  incorporated  to  improve  computational  efficiency, 
including  the  use  of  contemporary  nonlinear  least-squares  optimisation.  It  is  emphasised 
that  for  each  choice  of  knots  the  problem  is  linear  (cf.  Section  7). 

8.8  Optimal  discontinuous  piecewise-polynomial  approximation 

Consider  the  class  Sjv  of  splines  having  N interior  knots  of  multiplicity  n (i.e.,  nN 
interior  knots  in  all,  counting  coincidences).  An  s € Sn  will  in  general  be  discontinuous 
at  these  knots.  It  is  possible  to  determine  the  globally  optimal  locations  of  such  knots, 
using  the  principle  of  dynamic  programming  [4],  The  approach  is  based  on  the  fact  that 
the  best  approximant  to  the  leading  p (>  nN)  data  points  is  given  by  the  best 

over  q = nN  - n + l,nN  - n -I-  2, ...  ,p  - N of  sjv-i  6 S)v-i  for  the  leading  q <p  - N 
points,  together  with  a polynomial  piece  of  order  n over  points  <7  + 1 to  p.  By  this  simple 
recursive  means  the  globally  best  knots  for  splines  of  any  order  that  are  discontinuous 
at  any  number  of  knots  can  be  computed. 

Such  a solution  may  not  be  suitable  as  the  final  result  in  an  application.  However, 
it  can  be  useful  as  part  of  a knot  placement  strategy.  For  example,  suppose  good  knots 
for  a spline  of  order  n are  required.  An  approach  would  be  to  determine  an  optimal 
discontinuous  spline  of  order  n+ 1.  Use  this  spline  to  estimate  / in  expression  (8.1).  The 
integral  in  the  numerator  of  (8.1)  will  be  continuous  piecewise  linear  and  estimates  of 
the  optimal  knots  for  a C(n“2^  spline  readily  obtained  from  (8.2).  Mixed  results  have 
informally  been  obtained  by  the  authors  with  an  implementation  of  this  approach.  It  is 
suggested  that  it  be  revisited. 
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8.9  Knot  dispersion 

A set  of  knots  of  multiplicity  n is  positioned  using  an  appropriate  strategy,  such  as 
that  in  Section  (8.8)  and  a C^-1)(/)  spline  with  these  knots  determined.  Each  of  these 
multiple  knots  is  “dispersed” , viz.,  replaced  by  n nearby  simple  knots,  and  a replacement 
C(n_2)(I)  spline  computed.  A careful  strategy  for  knot  dispersion  is  required.  Again, 
informal  experiments  have  been  made  by  the  authors  and  mixed  results  obtained. 

8.10  Knot  initialisation  and  candidate  knot  locations 

Several  of  the  above  procedures  require  or  can  benefit  from  an  initial  placement  of  the 
knots.  Some  make  use  of  “candidate  knot  locations”. 

The  solution  to  the  free-knot  spline  approximation  problem  returned  by  iterative 
algorithms  typically  depends  on  the  starting  set  of  knots.  Although  an  algorithm  may 
return  a result  that  satisfies  the  necessary  and  sufficient  conditions  for  a solution  [17] , this 
result  may  be  locally  rather  than  globally  optimal.  There  is  no  known  characterisation  of 
a globally  optimal  solution.  The  careful  interpretation  of  solutions  is  therefore  important. 

The  use  of  candidate  knot  positions  can  be  helpful.  For  instance,  it  may  be  decided 
that  for  splines  of  even  order,  only  knots  that  coincide  with  data  abscissae  are  in  the 
candidate  set,  or,  for  splines  of  odd  order,  knots  only  at  points  mid-way  between  adja- 
cent data  abscissae  may  be  so  regarded.  Such  criteria  are  consistent  with  the  choice  for 
interpolating  splines  and  the  generalisation  covered  in  Section  8.2.  The  Lyche-Mprken 
knot  removal  algorithms  [20]  use  data  abscissae  as  candidate  knots.  The  use  of  a finite 
number  of  candidate  knot  locations  helps  to  reduce  the  dimensionality  of  the  problem: 
there  can  then  only  be  a finite  number  of  possible  knot  sets.  For  large  N this  number  can 
be  extremely  large,  making  it  prohibitive  to  examine  all  possibilities.  However,  for  small 
N,  e.g.,  1,  2 and  3,  it  may  indeed  be  possible,  and  can  pay  dividends.  Knot  insertion 
and  knot  removal  algorithms  can  also  implement  the  concept.  For  example,  at  each  stage 
of  a knot  insertion  strategy,  two  or  three  knots  can  be  inserted  “simultaneously”.  By 
the  method  of  their  introduction  these  new  knots  will  be  optimal  relative  to  the  knots 
previously  used  and  the  available  candidate  knot  locations. 

Another  aspect  of  a candidate  knot  set  is  that  if  it  is  sufficiently  dense  it  will  contain, 
to  a degree  of  approximation  dictated  by  its  “spacing” , the  optimal  knots  for  the  given 
data  set  [19].  For  instance,  consider  a set  ofm  > 100  data  points  specified  over  an  interval 
I normalised  to  [—1,1].  Take  100  uniformly  spaced  points  spanning  this  interval.  This  set 
will  contain,  to  approximately  two  figures,  each  globally  optimal  knot  set  having  N < 98 
knots5  (assuming  all  knots  are  simple).  If  a spline  based  on  these  98  candidate  interior 
knots  provided  a valid  model,  a suitable  knot  removal  algorithm  might  be  expected  to  be 
able  to  identify  reasonably  closely  the  optimal  knot  sets.  Work  is  required  to  determine 
the  degree  of  success  in  this  regard. 

9 Conclusions,  discussion  and  future  possibilities 

There  are  theoretical  difficulties  associated  with  existence,  uniqueness  and  characterisa- 
tion of  best  free-knot  £ 2 spline  approximants,  which  influence  practical  considerations. 

5The  two  endpoints  do  not  constitute  interior  knots. 
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A best  spline  in  the  class  of  splines  required  may  not  exist.  Take  as  {xj}™,  m = 21 
uniformly  spaced  values  in  [— 1, 1]  and  y,  = jx,j3.  To  see  that  a best  £2  spline  s of  order  4 
with  three  interior  knots  for  this  data  may  not  exist,  consider  the  choice  Aj  = — e,  A2  = 0 
and  A3  = e.  The  i2  error  can  be  made  smaller  than  any  given  5 > 0 for  some  e > 0. 
However,  if  the  i2  error  is  made  zero  by  the  choice  e = 0,  the  resulting  three  coincident 
knots  at  x = 0 mean  that  s has  lower  continuity  than  the  class  of  splines  considered.  In 
practice,  allowing  knots  to  come  “too  close”  together  can  introduce  undesirable  “sharp- 
ness” into  the  approximant.  Buffering  of  knots  [15],  to  ensure  a minimal  separation  helps 
in  this  regard.  The  use  of  a candidate  knot  set  introduces  a form  of  buffering.  In  some 
circumstances  the  coalescing  of  knots  would  be  ideal  in  terms  of  the  resulting  closeness 
of  s to  the  data.  In  some  applications  the  loss  of  smoothness  would  be  unacceptable. 
Therefore,  whether  buffering  is  appropriate  depends  on  the  use  to  be  made  of  s. 

The  solution  may  not  be  unique.  Figure  6 shows  a set  of  201  uniformly  spaced  points 
in  [—1,1]  taken  from  f(x)  '=  sign(x)  min(x,  1/2).  Figure  7 shows  the  root-mean-square 
residual  as  a function  of  knot  location  for  t2  splines  of  order  4 with  one  interior  knot. 
There  are  two  best  approximants,  one  with  its  knot  at  x = —0.63  and  the  other  at 
x — +0.63.  One  of  the  two  approximants  is  shown  in  Figure  6.  The  other  spline  is  its 
skew-symmetric  counterpart. 


Raw  data  and  spline  frt 


Fig.  6.  201  uniformly  spaced  points  in  [—1,1]  taken  from  f(x)  = sign(x)  min(x,  1/2) 
and  a best  l2  spline  approximant  with  one  knot. 

It  is  rarely  required  to  determine  an  i2  spline  approximant  that  is  globally  or  even 
locally  optimal  with  respect  to  its  knots.  An  approximant  that  met  some  closeness  re- 
quirement with  the  smallest  possible  number  of  knots  is  an  academic  rather  than  a 
pragmatic  objective.  Today,  the  more  important  consideration  is  to  obtain  an  approxim- 
ant that  represents  the  data  in  that  its  smoothness  is  consistent  with  that  of  the  function 
underlying  the  data  and  the  uncertainties  in  the  data.  (This  statement  must  be  qual- 
ified for  situations  where  the  continuity  class  of  splines  is  a consideration  as  discussed 
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Fig.  7.  The  root-mean-square  residual  as  a function  of  knot  location  for  £ 2 spline  ap- 
proximants with  one  knot  to  the  data  of  Figure  6. 

above.)  These  ends  may  be  achieved  by  seeking  an  approximant  with  a reasonable  but 
not  necessarily  optimal  number  of  knots. 

The  use  of  knot  removal  strategies  is  likely  to  attract  research  effort  in  the  future. 
One  reason  for  this  statement  is  that  the  need  to  work  with  large  initial  knot  sets  is 
not  as  computationally  prohibitive  with  today’s  powerful  personal  and  other  computers. 
Another  reason  is  that  the  approach  can  be  expected  to  produce  better  approximants, 
i.e. , smaller  £2  errors  for  the  same  number  of  knots. 

The  two  sets  of  crosses  in  Figure  5 correspond  to  the  values  of  the  root-mean-square 
residual  as  a function  of  the  number  of  knots  for  the  application  of  the  knot-insertion 
strategy  followed  by  the  knot-removal  strategy  for  the  thermophysical  data  of  Figure  3. 
The  two  sets,  where  the  “progress”  takes  place  from  left  to  right  along  the  “top  set”, 
followed  by  right  to  left  along  “the  bottom  set”,  constitutes  a form  of  hysteresis.  The 
behaviour  in  the  two  directions  is  distinctly  different.  In  particular,  the  figure  indicates 
that  once  an  acceptable  approximation  has  been  obtained  by  knot  insertion,  the  use 
of  knot  removal  can  deliver  an  approximation  of  comparable  quality  with  many  fewer 
knots  or  alternatively  for  the  same  number  of  knots  an  appreciably  better  approximation 
can  be  obtained.  In  this  case,  with  30  knots,  knot  removal  gives  an  £ 2 error  that  is  one 
quarter  of  that  for  knot  insertion.  For  an  £2  error  of  0.005,  30  knots  are  required  using 
knot  removal  and  43  using  knot  insertion. 

Large  data  sets,  as  are  now  frequently  being  produced  in  metrology  from  computer- 
controlled  measuring  systems,  are  ideal  for  the  purpose  of  obtaining  a sound  initial 
approximant  in  the  form  of  a valid  model  containing  possibly  many  more  knots  than 
the  minimum  possible.  Their  size  permits  initial  approximants  to  be  obtained,  even  with 
large  numbers  of  uniformly  spaced  knots,  that  provide  valid  but  highly  redundant  models 
for  the  data.  The  fact  that  such  sets  do  not  contain  “appreciable  gaps” , because  of  the 
manner  in  which  they  gathered,  means  that  this  fact  together  with  the  quantity  of  data 
far  outweighing  this  initial  number  of  knots  goes  a long  way  towards  ensuring  that  this 
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initial  approximant  is  valid.  There  is  much  scope  for  an  appreciable  number  of  knots  to 
be  removed.  The  initial  large  number  of  knots  may  also  have  been  obtained  by  the  use 
of  a knot  insertion  strategy.  It  is  the  experience  of  the  authors  that  knot  insertion  can 
introduce  appreciably  more  knots  than  given  by  the  optimal  choice. 

Because  the  early  approximants  may  be  far  from  optimal,  an  insertion  algorithm 
can  produce  knots  that  are  totally  different  from  those  in  an  optimal  approximant.  In 
contrast,  a knot  removal  algorithm  has  a possibility  to  obtain  good  knots.  (See  Section 
8.10.)  For  instance,  because  of  the  sequential  manner  in  which  knots  are  inserted,  there 
may  be  two  or  more  close  or  even  coincident  knots,  although  a good  knot  set  might  not 
have  this  property.  It  is  also  possible  that  such  knots,  although  not  part  of  an  optimal 
set,  are  influential  in  their  effect  on  a knot  removal  algorithm,  with  the  result  that  they 
appear  in  the  “final”  approximant. 

The  problem  of  data  containing  wild  points  is  not  addressed  satisfactorily  by  existing 
knot  placement  algorithms.  Because  such  points  are  responsible  for  a large  contribution 
to  the  62  error,  more  knots  would  be  placed  in  the  neighbourhood  of  such  a point  than 
would  otherwise  had  been  done.  The  knot  placement  strategy  can  then  be  influenced 
more  by  the  errors  in  the  data  than  by  the  properties  of  the  underlying  function.  For- 
mulations and  hence  algorithms  are  needed  that  have  greater  resilience  to  such  effects. 

In  solving  the  fixed-knot  spline  approximation  problem  as  part  of  the  free-knot  prob- 
lem, a knot  set  differs  from  a previous  knot  set  only  by  the  addition  or  removal  of  a 
small  number  of  knots.  In  linear  algebraic  terms  the  “new”  matrix  A (A'),  say,  differs  in 
only  a few  rows  from  the  previous  matrix  /1(A).  Considerable  gains  in  computational 
efficiency  can  be  obtained  by  accounting  for  this  fact.  This  paper  has  not  addressed  this 
issue,  concentrating  more  on  the  concepts  in  the  area.  There  is  much  scope,  however, 
for  the  application  of  the  recognised  stable  updating  and  downdating  techniques  of  lin- 
ear algebra  [17].  Their  application  will  not  reduce  the  computational  complexity  of  a 
procedure,  but  could  reduce  computation  times  for  large  problems  by  an  appreciable 
factor. 

The  work  described  here  was  supported  by  the  National  Measurement  System  Policy 
Unit  of  the  UK  Department  of  Trade  and  Industry  as  part  of  its  NMS  Software  Support 
for  Metrology  programme.  The  referee  provided  carefully  considered  comments  that 
permitted  the  paper  to  be  improved. 
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